LAW OF THE ITERATED LOGARITHM FOR U-STATISTICS OF
WEAKLY DEPENDENT OBSERVATIONS

HEROLD DEHLING AND MARTIN WENDLER 

Abstract. The law of the iterated logarithm for partial sums of weakly dependent processes
was intensively studied by Walter Philipp in the late 1960s and 1970s. In this paper,
we aim to extend these results to nondegenerate U-statistics of data that are strongly mixing
or functionals of an absolutely regular process.

Dedicated to the memory of Professor Walter Philipp (1936-2006) 

1. Introduction 

Let $(T_n)_{n \ge 1}$ be a sequence of random variables. We say that $(T_n)_{n \ge 1}$ satisfies the law of
the iterated logarithm (LIL) if $\operatorname{Var}(T_n) > 0$ for almost all $n \ge 1$ and

\[ \limsup_{n \to \infty} \frac{T_n}{\sqrt{2 \operatorname{Var}(T_n) \log\log \operatorname{Var}(T_n)}} = 1, \]

\[ \liminf_{n \to \infty} \frac{T_n}{\sqrt{2 \operatorname{Var}(T_n) \log\log \operatorname{Var}(T_n)}} = -1 \]

almost surely (a.s.). The LIL was originally established for partial sums of independent
identically distributed random variables by Khintchine in 1927 [22]. Hartman & Wintner [16] 
were able to prove Khintchine's result under the optimal condition that the random variables 
have mean zero and finite second moments. Together with the law of large numbers and the 
central limit theorem, the LIL is considered as one of the three classical limit theorems in 
probability theory. 

In a series of papers, starting in 1967 ([25], [26], [27], [30]), Walter Philipp investigated 
the LIL for partial sums of weakly dependent processes. Independently, Iosifescu (1968 
[19]) and Reznick (1968 [32]) studied the same problem; Oodaira & Yoshihara (1971 [24]) 
weakened their conditions. In [25] Walter Philipp studied the LIL for stationary processes
with finite moments of all orders satisfying some multiple mixing condition. In his proof
Walter Philipp established sharp bounds on the (2p)-th moments of partial sums and used classical
techniques such as the Borel-Cantelli lemma and maximal inequalities. In [27] Walter Philipp
investigated the LIL for $\phi$-mixing processes with finite 4-th moment. The proof is based on
a meta-theorem, stating that 'the LIL holds for any process for which the Borel-Cantelli 
lemma, the central limit theorem with a reasonably good remainder and a certain maximal 
inequality are valid.' This observation provided a guiding principle for many of the early 
proofs of the LIL for dependent processes. 

Walter Philipp's interest in dependent processes arose from specific applications to analysis
and probabilistic number theory. In all of his works, Walter Philipp had very concrete
applications in mind to which he could apply his theoretical results. In a joint paper with
Stackelberg [30], Walter Philipp established the LIL for the denominator of the n-th approximand
in the continued fraction expansion. The relation to weakly dependent processes is
provided by the fact that the digits in the continued fraction expansion form a $\psi$-mixing
sequence. In [26], Walter Philipp investigated dynamical systems arising from expanding
piecewise linear transformations of the unit interval; the map $T(x) = 2x \bmod 1$ being a
special example. These processes can be shown to have a representation as functionals of an
absolutely regular process.

In [26], Walter Philipp considered the uniform LIL, i.e. the LIL for the supremum of 
partial sums of $f(X_i) - E(f(X_i))$, where $f$ ranges over a class of functions. As an example,
Walter Philipp could study the discrepancy of sequences arising from expanding piecewise 
linear maps. This paper marked the beginning of Walter Philipp's interest in the LIL for 
empirical processes and for Banach space valued processes. In [28], Walter Philipp proved 
a Strassen-type functional LIL for the empirical process of data that have a representation 
as a functional of a strongly mixing process. In a joint paper with Kaufman [21], Walter 
Philipp studied the uniform LIL for classes of Lipschitz functions, among others for processes
of the form $X_k = \{n_k \omega\}$, $\omega \in [0, 1]$, where $(n_k)_{k \ge 1}$ is a lacunary sequence. The study of
the uniform LIL leads directly to Banach space valued random variables. The first LIL for 
weakly dependent Banach space valued processes was proved by Philipp & Kuelbs [23] in the 
case of uniformly mixing processes. Specializing to the case of Hilbert space valued random 
variables, Dehling & Philipp [8] extended this to strongly mixing processes. 

In the early 1970s, motivated by Strassen's proof of the functional LIL, Walter Philipp 
realized that almost sure invariance principles were ideal tools for proofs of the LIL. In 
1974, in an AMS memoir coauthored with Stout [31], Walter Philipp established almost sure 
invariance principles for a large class of weakly dependent processes, including functionals of 
absolutely regular processes. Philipp & Stout were among the first to recognize the power of 
the martingale approximation technique, invented in 1969 by Gordin [14]. Finally, in their 
seminal 1979 paper [3], Berkes & Philipp invented a new technique for proving almost sure 
invariance principles that can be used also for vector valued processes. The Berkes-Philipp 
approximation technique has been the basis of most work on invariance principles and the 
LIL in the following decades. For an excellent survey on invariance principles see Philipp 
[29]. 

Many other authors have considered the LIL for partial sums of weakly dependent pro- 
cesses. Berkes (1975 [2]) treats the LIL for trigonometric functions, Dabrowski (1985 [6]) 
establishes the LIL for associated random variables, Dabrowski & Dehling (1988 [7]) extended 
this to weakly associated random vectors. For partial sums of strongly mixing processes, the 
sharpest results presently available are due to Rio (1995 [33]). 

In the present paper, we investigate the LIL for bivariate U-statistics of weakly dependent
data. Given a symmetric, measurable function $h : \mathbb{R}^2 \to \mathbb{R}$ and a stationary stochastic
process $(X_n)_{n \in \mathbb{N}}$, we define the U-statistic with kernel $h$ by

\[ U_n(h) := \frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} h(X_i, X_j). \]

Thus, $U_n(h)$ is the arithmetic mean of the values $h(X_i, X_j)$, $1 \le i < j \le n$, and in that sense
U-statistics are generalized means. Many sample statistics can be written as a U-statistic,
at least asymptotically, and thus U-statistics are very important in statistical theory.
U-statistics were introduced independently by Halmos (1946 [15]) and Hoeffding (1948
[17]) in the case of i.i.d. observations. Halmos observed that $U_n(h)$ is an unbiased estimator
of $E h(X_1, X_2)$, and in fact the minimum variance unbiased estimator in nonparametric
models. Hoeffding showed that $U_n(h)$ is asymptotically normal.

Example 1.1. Let $h(x_1, x_2) = |x_1 - x_2|$. Then the corresponding U-statistic is

\[ U_n(h) = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} |X_i - X_j|, \]

known as Gini's mean difference.
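The definition of $U_n(h)$ and Example 1.1 can be illustrated with a few lines of Python. This is only a minimal sketch: the function names `u_statistic` and `gini_mean_difference` are hypothetical and chosen for this illustration, and the brute-force loop over all pairs is meant for small samples.

```python
from itertools import combinations


def u_statistic(data, kernel):
    """Arithmetic mean of kernel(x_i, x_j) over all pairs with i < j."""
    pairs = list(combinations(data, 2))
    if not pairs:
        raise ValueError("need at least two observations")
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)


def gini_mean_difference(data):
    """U-statistic with the kernel h(x, y) = |x - y| of Example 1.1."""
    return u_statistic(data, lambda x, y: abs(x - y))


sample = [1.0, 2.0, 4.0]
# The three pairs give |1-2| = 1, |1-4| = 3, |2-4| = 2, so the mean is 2.0.
print(gini_mean_difference(sample))
```

Any symmetric kernel can be passed to `u_statistic`, so the same helper covers the other examples of this section.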

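Example 1.2. Let $h(x_1, x_2) = \int_0^1 (1_{\{x_1 \le t\}} - t)(1_{\{x_2 \le t\}} - t)\, dt$. This leads to the following
U-statistic:

\[ U_n(h) = \frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} h(X_i, X_j) \]

\[ = \frac{1}{n(n-1)} \Bigl( \sum_{i=1}^n \sum_{j=1}^n \int_0^1 (1_{\{X_i \le t\}} - t)(1_{\{X_j \le t\}} - t)\, dt - \sum_{i=1}^n h(X_i, X_i) \Bigr) \]

\[ = \frac{n}{n-1} \int_0^1 (F_n(t) - t)^2\, dt - \frac{1}{n(n-1)} \sum_{i=1}^n h(X_i, X_i) \]

\[ = \frac{n}{n-1} V_n - \frac{1}{n(n-1)} \sum_{i=1}^n h(X_i, X_i), \]

where $F_n$ denotes the empirical distribution function of $X_1, \dots, X_n$ and $V_n := \int_0^1 (F_n(t) - t)^2\, dt$.
$V_n$ is called the Cramér-von Mises statistic and can be used for testing the hypothesis that $X_n$
has a uniform distribution on $[0, 1]$, as an alternative to the Kolmogorov-Smirnov statistic
$K_n := \sup_{t \in [0,1]} |F_n(t) - t|$ (also called discrepancy).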
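The identity between $U_n(h)$ and the Cramér-von Mises statistic in Example 1.2 can be checked numerically. The sketch below rests on two facts that are not stated in the text and are used here as assumptions: for $x, y \in [0,1]$ the kernel has the closed form $h(x,y) = \frac13 - \max(x,y) + \frac12 (x^2 + y^2)$ (obtained by integrating the product term by term), and $V_n$ can be computed from the order statistics by the standard formula $V_n = \frac{1}{12 n^2} + \frac1n \sum_{i=1}^n \bigl(X_{(i)} - \frac{2i-1}{2n}\bigr)^2$. All function names are hypothetical.

```python
import numpy as np


def cvm_kernel(x, y):
    """Closed form of the kernel of Example 1.2 for x, y in [0, 1] (assumed derivation)."""
    return 1.0 / 3.0 - np.maximum(x, y) + 0.5 * (x ** 2 + y ** 2)


def u_statistic_cvm(x):
    """U_n(h): mean of the kernel over all pairs i < j."""
    n = len(x)
    values = cvm_kernel(x[:, None], x[None, :])
    return values[np.triu_indices(n, k=1)].mean()


def cramer_von_mises(x):
    """V_n = int_0^1 (F_n(t) - t)^2 dt via the order-statistics formula."""
    n = len(x)
    xs = np.sort(x)
    i = np.arange(1, n + 1)
    return 1.0 / (12.0 * n ** 2) + np.mean((xs - (2 * i - 1) / (2.0 * n)) ** 2)


def right_hand_side(x):
    """n/(n-1) * V_n - 1/(n(n-1)) * sum_i h(X_i, X_i)."""
    n = len(x)
    diagonal = cvm_kernel(x, x).sum()
    return n / (n - 1.0) * cramer_von_mises(x) - diagonal / (n * (n - 1.0))


rng = np.random.default_rng(0)
x = rng.uniform(size=25)
print(u_statistic_cvm(x), right_hand_side(x))
```

Both printed values should agree up to floating-point rounding, since the identity is purely algebraic and holds for every sample in $[0,1]$.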

Example 1.3. Let $t \in \mathbb{R}$ and $h(x_1, x_2) = 1_{\{\frac12 (x_1 + x_2) \le t\}}$. This kernel is related to the
Hodges-Lehmann estimator

\[ H_n = \operatorname{median} \Bigl\{ \frac{X_i + X_j}{2} \;\Big|\; 1 \le i < j \le n \Bigr\}, \]

as we will see later.
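For a concrete impression of Example 1.3, the following sketch computes the Hodges-Lehmann estimator and the U-statistic with the indicator kernel for a small sample. The names `pairwise_means`, `hodges_lehmann` and `u_n` are hypothetical, and `numpy.median` is used as one possible median convention (it averages the two middle values when the number of pairs is even).

```python
from itertools import combinations

import numpy as np


def pairwise_means(data):
    """All values (x_i + x_j) / 2 with i < j."""
    return np.array([(x + y) / 2.0 for x, y in combinations(data, 2)])


def hodges_lehmann(data):
    """Median of the pairwise means (numpy's median convention)."""
    return float(np.median(pairwise_means(data)))


def u_n(data, t):
    """U-statistic with kernel 1{(x + y)/2 <= t}: fraction of pairs with mean <= t."""
    return float(np.mean(pairwise_means(data) <= t))


sample = [1.0, 2.0, 3.0, 10.0]
# Pairwise means: 1.5, 2.0, 5.5, 2.5, 6.0, 6.5
print(hodges_lehmann(sample))  # 4.0, the average of the middle values 2.5 and 5.5
print(u_n(sample, 4.0))        # 0.5, three of the six pairwise means are <= 4
```

The single large observation 10.0 moves the sample mean to 4.0 as well in this toy sample, so the example is meant to show the mechanics rather than the robustness of the estimator.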



The key tool in the analysis of U-statistics is the Hoeffding decomposition, introduced
originally by Hoeffding (1948),

\[ U_n(h) = \theta + \frac{2}{n} \sum_{i=1}^n h_1(X_i) + U_n(h_2). \]

Here, $\theta$, $h_1(x)$ and $h_2(x, y)$ are defined by

\[ \theta := E h(X, Y), \]

\[ h_1(x) := E h(x, Y) - \theta, \]

\[ h_2(x, y) := h(x, y) - h_1(x) - h_1(y) - \theta, \]

where $X$, $Y$ are independent random variables with the same distribution as $X_1$. The linear
term in the Hoeffding decomposition, $\frac{2}{n} \sum_{i=1}^n h_1(X_i)$, can be treated by standard limit
theorems for partial sum processes. Note that, by definition, the $h_1(X_i)$ are centered (i.e. mean
zero) random variables. The kernel $h_2(x, y)$ has the property that for every $x \in \mathbb{R}$

\[ E h_2(x, Y) = 0; \]

kernels with this property are called degenerate. It turns out that $U_n(h_2)$ is generally stochastically
dominated by the linear term, and thus the asymptotic behavior of $U_n(h) - \theta$
is the same as that of $\frac{2}{n} \sum_{i=1}^n h_1(X_i)$. Depending on the type of limit theorem and the
conditions imposed on the process $(X_i)_{i \ge 1}$, this can be more or less difficult to establish.
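The Hoeffding decomposition is an algebraic identity and can be verified on simulated data. The sketch below uses the illustrative kernel $h(x, y) = xy$ and assumes observations with mean $\mu = 1/2$ (uniform on $[0,1]$); both choices, as well as the helper name `u_stat`, are assumptions made for this example only. In that case $\theta = \mu^2$, $h_1(x) = \mu x - \mu^2$ and $h_2(x, y) = (x - \mu)(y - \mu)$.

```python
import numpy as np


def u_stat(x, kernel):
    """Mean of kernel(x_i, x_j) over all pairs i < j (kernel must broadcast)."""
    n = len(x)
    values = kernel(x[:, None], x[None, :])
    return values[np.triu_indices(n, k=1)].mean()


mu = 0.5                      # assumed mean of the observations
theta = mu ** 2               # theta = E h(X, Y) for h(x, y) = x * y


def h(x, y):
    return x * y


def h1(x):
    return mu * x - theta     # h1(x) = E h(x, Y) - theta


def h2(x, y):
    return h(x, y) - h1(x) - h1(y) - theta


rng = np.random.default_rng(1)
x = rng.uniform(size=200)

lhs = u_stat(x, h)
linear_term = 2.0 / len(x) * h1(x).sum()
rhs = theta + linear_term + u_stat(x, h2)
print(lhs, rhs)
```

The two printed numbers should coincide up to rounding, whatever sample is drawn; only the interpretation of $h_2$ as a degenerate kernel depends on $\mu$ being the true mean.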

For degenerate U-statistics of i.i.d. observations, Dehling, Denker and Philipp (1985 [10])
and Dehling (1989 [9]) established the LIL. They could show that

\[ \limsup_{n \to \infty} \frac{1}{n \log\log n} \sum_{1 \le i < j \le n} h_2(X_i, X_j) = c_{h_2}, \]

where $c_{h_2}$ is the largest eigenvalue of the integral operator with kernel $h_2$. This was extended
to mixing random variables by Kanagawa and Yoshihara [20] under the condition that the
eigenvalues of $h_2$ decrease quickly, a condition that is hard to verify in practice.

Recall that the strong mixing coefficients of a stationary stochastic process $(X_n)_{n \in \mathbb{N}}$ are defined
by

\[ \alpha(k) := \sup \{ |P(A \cap B) - P(A) P(B)| : A \in \mathcal{F}_1^n,\ B \in \mathcal{F}_{n+k}^\infty,\ n \in \mathbb{N} \}, \]

where $\mathcal{F}_a^l$ denotes the $\sigma$-field generated by the random variables $X_a, \dots, X_l$. For a detailed
description of the various mixing conditions see Doukhan [13] and Bradley [5]. The absolute
regularity coefficients are defined as

\[ \beta(k) := \sup_{n \in \mathbb{N}} E \sup \{ |P(A \mid \mathcal{F}_1^n) - P(A)| : A \in \mathcal{F}_{n+k}^\infty \}. \]

We say that $(X_n)_{n \in \mathbb{N}}$ is strongly mixing if $\lim_{k \to \infty} \alpha(k) = 0$ and absolutely regular if
$\lim_{k \to \infty} \beta(k) = 0$. Absolute regularity is a stronger assumption than strong mixing, as
$\alpha(k) \le \beta(k)$.

We will consider strongly mixing sequences and functionals of absolutely regular sequences.
Let $(Z_n)_{n \in \mathbb{Z}}$ be a stationary sequence of random variables satisfying the absolute regularity
condition $\beta(k) \to 0$ as $k \to \infty$. We call a sequence $(X_n)_{n \in \mathbb{N}}$ a one-sided functional of
$(Z_n)_{n \in \mathbb{N}}$ if there is a measurable function $f : \mathbb{R}^{\mathbb{N}} \to \mathbb{R}$ such that

\[ X_n = f((Z_{n+k})_{k \ge 0}). \]

In addition we will assume that $(X_n)_{n \in \mathbb{N}}$ satisfies the r-approximation condition:

Definition 1.4. Let $r \ge 1$. We say that $(X_n)_{n \in \mathbb{N}}$ satisfies the r-approximating condition
with constants $(a_l)_{l \in \mathbb{N}}$ if

\[ \| X_1 - E(X_1 \mid \mathcal{F}_0^l) \|_r \le a_l, \qquad l = 0, 1, 2, \dots \]

where $\lim_{l \to \infty} a_l = 0$, $\mathcal{F}_0^l$ is the $\sigma$-field generated by $Z_0, \dots, Z_l$ and $\|Y\|_r = (E|Y|^r)^{1/r}$.
Example 1.5. Let $(Z_n)_{n \in \mathbb{N}}$ be independent with $P[Z_n = 1] = P[Z_n = 0] = \frac12$ and

\[ X_n = \sum_{k=n}^{\infty} \frac{1}{2^{k-n+1}} Z_k. \]

Note that $(X_n)_{n \in \mathbb{N}}$ is a deterministic sequence, as $X_{n+1} = T(X_n) := 2 X_n \bmod 1$. Thus
$(X_n)_{n \in \mathbb{N}}$ is not strongly mixing, but nevertheless this sequence satisfies the r-approximating
condition for every $r \ge 1$, as

\[ \| X_1 - E[X_1 \mid \mathcal{F}_0^l] \|_r = \Bigl\| \sum_{k=l+1}^{\infty} \frac{1}{2^k} \Bigl( Z_k - \frac12 \Bigr) \Bigr\|_r \le \sum_{k=l+1}^{\infty} \frac{1}{2^k} = \frac{1}{2^l} =: a_l. \]
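The doubling-map sequence of Example 1.5 is easy to simulate. The sketch below assumes the representation $X_n = \sum_{k \ge n} 2^{-(k-n+1)} Z_k$ and truncates the series after a hypothetical number `N = 40` of binary digits, so that all sums are exactly representable in double precision. It checks the relation $X_{n+1} = 2 X_n \bmod 1$ and the approximation of $X_1$ by its conditional expectation given $Z_0, \dots, Z_l$, where for the infinite sequence the unobserved digits contribute their mean $\sum_{k > l} 2^{-k} \cdot \frac12 = 2^{-(l+1)}$.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 40                                  # number of simulated digits (truncation, an assumption)
z = rng.integers(0, 2, size=N)          # fair coin flips Z_0, ..., Z_{N-1}


def x_at(n):
    """Truncated version of X_n = sum_{k >= n} Z_k / 2^(k - n + 1)."""
    k = np.arange(n, N)
    return float(np.sum(z[k] * 0.5 ** (k - n + 1)))


def cond_exp_x1(l):
    """E[X_1 | Z_0, ..., Z_l] for the infinite sequence: known digits plus mean of the tail."""
    k = np.arange(1, l + 1)
    return float(np.sum(z[k] * 0.5 ** k)) + 0.5 ** (l + 1)


# X_{n+1} = 2 X_n mod 1 holds exactly for the truncated dyadic sums.
doubling_ok = all((2.0 * x_at(n)) % 1.0 == x_at(n + 1) for n in range(N - 1))

# Approximation errors |X_1 - E[X_1 | Z_0..Z_l]| compared with the bound 2^(-l).
errors = [abs(x_at(1) - cond_exp_x1(l)) for l in range(0, 20)]
bounds = [0.5 ** l for l in range(0, 20)]
print(doubling_ok, all(e <= b for e, b in zip(errors, bounds)))
```

The error bound holds pathwise here, which is why it is valid in every $L_r$ norm simultaneously.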



U-statistics have not only been studied for i.i.d. data, but also under various mixing
conditions. While under independence the summands of $U_n(h_2)$ are uncorrelated, they
can be correlated if the random variables $(X_n)_{n \in \mathbb{N}}$ are dependent, so one has to establish
generalized covariance inequalities to derive moment bounds for $U_n(h_2)$.

Under the strong assumption of $\star$-mixing and the existence of 4th moments, Sen [34]
showed that $\sqrt{n}\, U_n(h_2) \to 0$ a.s. Yoshihara [36] weakened this to absolutely regular processes.
Convergence to zero in probability of $\sqrt{n}\, U_n(h_2)$ was proved by Denker and Keller
[12] for functionals of absolutely regular processes and by Dehling and Wendler [11] for
strongly mixing sequences. The convergence of $\sqrt{n}\, U_n(h_2)$ together with the Central Limit
Theorem for partial sums can be used to prove the asymptotic normality of nondegenerate
U-statistics.

In 1961, Hoeffding showed that $U_n(h_2) \to 0$ a.s. for independent observations. If $h$ is
continuous, this holds under the minimal assumption that $(X_n)_{n \in \mathbb{N}}$ is ergodic, as Aaronson
et al. [1] have proved. We give better rates of convergence for absolutely regular sequences,
strongly mixing sequences and functionals of absolutely regular sequences. We will apply
moment inequalities and the method of subsequences. Together with the LIL for partial
sums, this will imply the LIL for U-statistics.

For independent data, second moments of the kernel are required. For mixing data, one 
needs higher moments: 

Definition 1.6. Let $(X_n)_{n \in \mathbb{N}}$ be a stationary process. A kernel $h$ has uniform m-moments
if there is a constant $M$ such that for all $k \in \mathbb{N}$

\[ \iint |h(x_1, x_2)|^m \, dF(x_1)\, dF(x_2) \le M, \]

\[ \int |h(x_1, x_k)|^m \, dP(x_1, x_k) \le M. \]

In the case of strong mixing and functionals of absolutely regular processes, one needs also 
a continuity condition. We consider the P-Lipschitz condition (see Dehling, Wendler [11]) 
and the variation condition introduced by Denker and Keller [12]: 

Definition 1.7. (1) A kernel $h$ is called P-Lipschitz-continuous with constant $L > 0$ if

\[ E \bigl[ |h(X, Y) - h(X', Y)| \, 1_{\{|X - X'| \le \epsilon\}} \bigr] \le L \epsilon \]

for every $\epsilon > 0$, every pair $X$ and $Y$ with the common distribution $P_{X_1, X_k}$ for a $k \in \mathbb{N}$
or $P_{X_1} \times P_{X_1}$, and $X'$ and $Y$ also with one of these common distributions.
(2) A kernel $h$ satisfies the variation condition if there is a constant $L$ such that

\[ E \Bigl[ \sup_{\|(x,y) - (X,Y)\| \le \epsilon,\ \|(x',y') - (X,Y)\| \le \epsilon} |h(x, y) - h(x', y')| \Bigr] \le L \epsilon, \]

where $X$, $Y$ have the common distribution $P_{X_1} \times P_{X_1}$ and $\|(x_1, x_2)\| = (x_1^2 + x_2^2)^{1/2}$
denotes the Euclidean norm.

Example 1.8. Let $h(x_1, x_2) = |x_1 - x_2|$. As this kernel is Lipschitz-continuous, it is clear
that it satisfies the P-Lipschitz condition and the variation condition.

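Example 1.9. Let $h(x_1, x_2) = \int_0^1 (1_{\{x_1 \le t\}} - t)(1_{\{x_2 \le t\}} - t)\, dt$. This kernel is uniformly
bounded by 1 and P-Lipschitz-continuous with constant 1, as

\[ E \bigl[ |h(X, Y) - h(X', Y)| \, 1_{\{|X - X'| \le \epsilon\}} \bigr] \]

\[ = E \Bigl[ \Bigl| \int_0^1 (1_{\{X \le t\}} - 1_{\{X' \le t\}})(1_{\{Y \le t\}} - t)\, dt \Bigr| \, 1_{\{|X - X'| \le \epsilon\}} \Bigr] \]

\[ \le E \Bigl[ \int_0^1 |1_{\{X \le t\}} - 1_{\{X' \le t\}}| \, dt \; 1_{\{|X - X'| \le \epsilon\}} \Bigr] \]

\[ = E \bigl[ |X - X'| \, 1_{\{|X - X'| \le \epsilon\}} \bigr] \le \epsilon. \]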



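Example 1.10. Let $t \in \mathbb{R}$ and $h(x_1, x_2) = 1_{\{\frac12 (x_1 + x_2) \le t\}}$. Then

\[ \sup_{\|(x,y) - (X,Y)\| \le \epsilon,\ \|(x',y') - (X,Y)\| \le \epsilon} \bigl| 1_{\{\frac12 (x + y) \le t\}} - 1_{\{\frac12 (x' + y') \le t\}} \bigr| = \begin{cases} 1 & \text{if } \frac{X + Y}{2} \in \bigl( t - \frac{\epsilon}{\sqrt2},\ t + \frac{\epsilon}{\sqrt2} \bigr], \\ 0 & \text{else.} \end{cases} \]

If $X_1$ has a bounded density, then the density $f_{\frac12 (X+Y)}$ of $\frac12 (X + Y)$ is also bounded, where
$X$, $Y$ are independent random variables with the same distribution as $X_1$. Then

\[ E \Bigl[ \sup_{\|(x,y) - (X,Y)\| \le \epsilon,\ \|(x',y') - (X,Y)\| \le \epsilon} |h(x, y) - h(x', y')| \Bigr] \le P \Bigl[ \frac{X + Y}{2} \in \Bigl( t - \frac{\epsilon}{\sqrt2},\ t + \frac{\epsilon}{\sqrt2} \Bigr] \Bigr] \le \bigl( \sqrt2 \sup f_{\frac12 (X+Y)} \bigr) \cdot \epsilon \]

and $h$ satisfies the variation condition.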



Remark. The two continuity conditions are close in spirit. For P-Lipschitz continuity,
only one of the arguments of $h$ is replaced, but one has to consider all common
distributions of $X$, $Y$, which can be difficult to check. In the variation condition,
both arguments of $h$ are replaced, but only the case that $X$ and $Y$ are
independent has to be considered.



2. Main Results 

Theorem 1. Let $(X_n)_{n \in \mathbb{N}}$ be a stationary process and $h_2$ a degenerate, centered kernel with
uniform $(2 + \delta)$-moments for some $\delta > 0$. Let $\tau \ge 0$ be such that one of the following three
conditions holds:

(1) $(X_n)_{n \in \mathbb{N}}$ is absolutely regular and $\sum_{k=0}^{n} k \beta^{\frac{\delta}{2+\delta}}(k) = O(n^\tau)$.

(2) $(X_n)_{n \in \mathbb{N}}$ is strongly mixing, $E|X_1|^\gamma < \infty$ for a $\gamma > 0$, $h_2$ satisfies the P-Lipschitz-continuity
or the variation condition and $\sum_{k=0}^{n} k \alpha^{\frac{2\gamma\delta}{3\gamma\delta + \delta + 5\gamma + 2}}(k) = O(n^\tau)$.

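(3) $(X_n)_{n \in \mathbb{N}}$ is a 1-approximating functional of an absolutely regular process and $h_2$ satisfies
the P-Lipschitz-continuity or the variation condition. For $\alpha_L = \sqrt{2 \sum_{i=L}^{\infty} a_i}$: $\sum_{k=0}^{n} k \bigl( \beta^{\frac{\delta}{2+\delta}}(k) + \alpha_k^{\frac{\delta}{2+\delta}} \bigr) = O(n^\tau)$.
Then: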



(1) \[ \frac{n^{1 - \frac{\tau}{2}}}{\log^{3/2} n \, \log\log n} \, U_n(h_2) \to 0 \quad \text{a.s.} \]

Remark. Since $\beta(k) \le 1$, condition (1) in Theorem 1 is always satisfied with some $\tau \in [0, 2]$.
In the extreme case when $\tau = 2$, the conclusion of Theorem 1 is trivial, since $U_n(h_2) \to 0$ by
the U-statistic ergodic theorem for absolutely regular processes, established by Aaronson et
al. [1]. In the other extreme case $\tau = 0$, i.e. when the series $\sum_{k=1}^{\infty} k \beta^{\frac{\delta}{2+\delta}}(k)$ converges, the
conclusion of Theorem 1 is close to the optimal rate which follows in the independent case
from the LIL of Dehling, Denker and Philipp [10].
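To see how condition (1) of Theorem 1 translates into a value of $\tau$, consider the purely hypothetical polynomial rate $\beta(k) = k^{-b}$ with $b > 0$; this rate and the helper names below are assumptions for illustration and are not taken from the text. Writing $c = b\delta/(2+\delta)$, the partial sums $\sum_{k \le n} k \cdot k^{-c}$ grow like $n^{2-c}$ if $c < 2$ and stay bounded if $c > 2$, which suggests $\tau = \max(0, 2 - c)$ (the boundary case $c = 2$ gives logarithmic growth and is not covered by this formula).

```python
import numpy as np


def tau_polynomial(b, delta):
    """Growth exponent of sum_{k<=n} k * beta(k)^(delta/(2+delta)) for beta(k) = k^(-b).

    Valid for b * delta / (2 + delta) != 2; in the boundary case the sum grows like log n.
    """
    c = b * delta / (2.0 + delta)
    return max(0.0, 2.0 - c)


def partial_sum(n, b, delta):
    """Partial sum sum_{k=1}^{n} k * k^(-b * delta / (2 + delta)); the k = 0 term vanishes."""
    k = np.arange(1, n + 1, dtype=float)
    return float(np.sum(k * k ** (-b * delta / (2.0 + delta))))


b, delta = 3.0, 1.0
tau = tau_polynomial(b, delta)
# Doubling n should multiply the partial sum by roughly 2^tau.
ratio = partial_sum(2000, b, delta) / partial_sum(1000, b, delta)
print(tau, ratio, 2.0 ** tau)
```

With such a $\tau$, the normalization in Theorem 1 is $n^{1-\tau/2}$ up to logarithmic factors, so faster mixing (larger $b$) gives a rate closer to the independent case.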

Theorem 2. Let $(X_n)_{n \in \mathbb{N}}$ be a stationary process and $h$ a centered kernel with uniform
$(2 + \delta)$-moments for some $\delta > 0$. Let $\epsilon > 0$ be such that one of the following three conditions
holds:

(1) $(X_n)_{n \in \mathbb{N}}$ is absolutely regular and $\sum_{k=0}^{n} k \beta^{\frac{\delta}{2+\delta}}(k) = O(n^{1-\epsilon})$.

(2) $(X_n)_{n \in \mathbb{N}}$ is strongly mixing, $E|X_1|^\gamma < \infty$ for a $\gamma > 0$, $h_2$ satisfies the P-Lipschitz-continuity
or the variation condition and $\sum_{k=0}^{n} k \alpha^{\frac{2\gamma\delta}{3\gamma\delta + \delta + 5\gamma + 2}}(k) = O(n^{1-\epsilon})$.

(3) (X n ) ngW is a 1- approximating functional with constants (a n ) ne N of an absolutely 

regular process with mixing coefficients safisfying (3 (n) = O (^n 8 <^ 336 j and for 

5 

a L — v2~SiSi a i : Ylk=o^ a k +S = O {n 1 ~ e ). h 2 satisfies the P -Lipschitz- continuity or 
the variation condition, {hi (X„)) n6W is (2 + 5) -approximating with constants (6 n ) ng]N , 

such that b„ = O ( n -2 ^ 



If additionally $\sigma_\infty^2 := \operatorname{Var}[h_1(X_0)] + 2 \sum_{i=1}^{\infty} \operatorname{Cov}[h_1(X_0), h_1(X_i)] > 0$, then the LIL holds
for $T_n = \sum_{1 \le i < j \le n} h(X_i, X_j)$.

3. An application to robust estimation 

The classical approach to estimate the location of a sequence of random variables $(X_n)_{n \in \mathbb{N}}$
is based on the sample mean $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$, but this estimator is not robust in the sense
that a single extreme value can have a big influence on $\bar{X}$. The median of $X_1, \dots, X_n$ is
robust to outliers, but has a low efficiency if the $X_n$ are standard normal. As a compromise,
one can use a trimmed mean or the Hodges-Lehmann estimator

\[ H_n = \operatorname{median} \Bigl\{ \frac{X_i + X_j}{2} \;\Big|\; 1 \le i < j \le n \Bigr\}. \]

The Hodges-Lehmann estimator can be expressed with the generalized inverse of the empirical
U-distribution function

\[ H_n = U_n^{-1}\bigl(\tfrac12\bigr) := \inf \bigl\{ t \in \mathbb{R} \mid U_n(t) \ge \tfrac12 \bigr\} \]

with

\[ U_n(t) := \frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} 1_{\{\frac12 (X_i + X_j) \le t\}}. \]

Let $U(t) = E \bigl[ 1_{\{\frac12 (X + Y) \le t\}} \bigr]$, where $X$ and $Y$ are independent. If $U(t)$ is strictly increasing
and continuous, we can without loss of generality assume that $U(t) = t$ for $t \in [0, 1]$. For
functionals of absolutely regular processes, Borovkova, Burton and Dehling [4] have proved
the convergence of the empirical U-process

\[ \bigl( \sqrt{n} \, (U_n(t) - U(t)) \bigr)_{t \in \mathbb{R}} \]

to a Gaussian process. By Theorem 1 of Vervaat [35], the same holds for the inverse process
$\bigl( \sqrt{n} \, (U_n^{-1}(t) - U^{-1}(t)) \bigr)_t$, so $H_n$ is asymptotically normal. Our aim is to prove the LIL
for $H_n$. First note that $H_n$ is smaller than $t_0 = U^{-1}(\frac12)$ iff $U_n(t_0)$ is bigger than $\frac12$. This
converse behavior motivates a generalized Bahadur representation

(2) \[ H_n = t_0 + \frac{U(t_0) - U_n(t_0)}{U'(t_0)} + R_n, \]

where we need to assume that $U'(t_0) > 0$ (so $U(t)$ is invertible in a neighborhood of $t_0$ and
$U(t_0) = \frac12$). The following short calculation shows that the remainder $R_n$ is related to the
inverse of the empirical U-process centered in $(t_0, U_n(t_0))$. We define:



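\[ Z_n(x) := \bigl( U_n(\cdot + t_0) - U_n(t_0) \bigr)^{-1}(x) - \frac{x}{U'(t_0)} \]

and observe that

\[ Z_n(x) = \inf \{ s \mid U_n(s + t_0) - U_n(t_0) \ge x \} - \frac{x}{U'(t_0)} \]

\[ = \inf \{ s \mid U_n(s) \ge x + U_n(t_0) \} - \frac{x}{U'(t_0)} - t_0 = U_n^{-1}(x + U_n(t_0)) - \frac{x}{U'(t_0)} - t_0. \]

Thus we finally get

\[ Z_n(U(t_0) - U_n(t_0)) = H_n - t_0 + \frac{U_n(t_0) - U(t_0)}{U'(t_0)} = R_n. \]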



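By Theorem 2, $U(t_0) - U_n(t_0) = O \Bigl( \sqrt{\frac{\log\log n}{n}} \Bigr)$ a.s., so if we can show that for any constant
$C$

(3) \[ \sup_{|t| \le C \sqrt{\frac{\log\log n}{n}}} \bigl( U_n(t_0 + t) - U_n(t_0) - U(t_0 + t) + U(t_0) \bigr) = o \Bigl( \sqrt{\frac{\log\log n}{n}} \Bigr) \quad \text{a.s.,} \]

then by Theorem 4 of Vervaat [35]

\[ \sup_{|t| \le C \sqrt{\frac{\log\log n}{n}}} Z_n(t) = o \Bigl( \sqrt{\frac{\log\log n}{n}} \Bigr) \quad \text{a.s.} \]

and hence

\[ R_n = Z_n(U(t_0) - U_n(t_0)) = o \Bigl( \sqrt{\frac{\log\log n}{n}} \Bigr) \quad \text{a.s.} \]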

The LIL for $H_n$ follows then easily from the Bahadur representation (2) and the LIL for
$U_n(t_0)$. We will only sketch the proof of (3). $U_n(t)$ and $U(t)$ are nondecreasing, so for
$t_1 \le t \le t_2$:

\[ |U_n(t) - U(t)| \le \max \{ |U_n(t_1) - U(t)|, |U_n(t_2) - U(t)| \} \]

\[ \le \max \{ |U_n(t_1) - U(t_1)|, |U_n(t_2) - U(t_2)| \} + (U(t_2) - U(t_1)). \]

Furthermore, $U(t)$ is differentiable in $t_0$, so $(U(t_2) - U(t_1)) = O(t_2 - t_1)$ as $t_1, t_2 \to t_0$, and
for every $\epsilon > 0$ we can find a $K$ such that

\[ \sup_{|t| \le C \sqrt{\frac{\log\log n}{n}}} \sqrt{\frac{n}{\log\log n}} \, Z_n(t) \le \max_{|k| \le K} \sqrt{\frac{n}{\log\log n}} \, Z_n \Bigl( \frac{kC}{K} \sqrt{\frac{\log\log n}{n}} \Bigr) + \epsilon. \]

$Z_n \bigl( \frac{kC}{K} \sqrt{\frac{\log\log n}{n}} \bigr)$ is a U-statistic with kernel $1_{\{\frac12 (x_1 + x_2) \in (t_0,\, t_0 + \frac{kC}{K} \sqrt{\log\log n / n}]\}}$, which has decaying
moments. Similar to Theorem 2, one can show that $\sqrt{\frac{n}{\log\log n}} \, Z_n \bigl( \frac{kC}{K} \sqrt{\frac{\log\log n}{n}} \bigr) \to 0$
a.s. if the mixing assumption (3) of Theorem 2 holds.
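The representation $H_n = U_n^{-1}(\frac12) = \inf\{t \mid U_n(t) \ge \frac12\}$ used in this section can be made concrete as follows. The sketch is an illustration under stated assumptions: the helper names are hypothetical, and the comparison shows that the generalized inverse picks the lower median of the pairwise means, whereas `numpy.median` averages the two middle values when the number of pairs is even.

```python
import math
from itertools import combinations

import numpy as np


def sorted_pairwise_means(data):
    """Sorted values (x_i + x_j) / 2 with i < j."""
    return np.sort(np.array([(x + y) / 2.0 for x, y in combinations(data, 2)]))


def empirical_u(means, t):
    """U_n(t): fraction of pairwise means that are <= t."""
    return float(np.mean(means <= t))


def generalized_inverse(means, p):
    """inf{t : U_n(t) >= p}; U_n only jumps at the pairwise means, so we scan them in order."""
    for t in means:
        if empirical_u(means, t) >= p:
            return float(t)
    raise ValueError("p must not exceed 1")


sample = [1.0, 2.0, 3.0, 10.0]
means = sorted_pairwise_means(sample)          # 1.5, 2.0, 2.5, 5.5, 6.0, 6.5
h_n = generalized_inverse(means, 0.5)
lower_median = float(means[math.ceil(len(means) / 2) - 1])
print(h_n, lower_median, float(np.median(means)))
```

For an odd number of pairs all three values coincide; for an even number, as here, the generalized inverse agrees with the lower median only, which is the convention implicit in the infimum.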

4. Preliminary results 

To control the moments of degenerate U-statistics, we need bounds for the covariance.
In the following three lemmas, let $m = \max \{ i_{(2)} - i_{(1)},\ i_{(4)} - i_{(3)} \}$, where $\{ i_1, i_2, i_3, i_4 \} =
\{ i_{(1)}, i_{(2)}, i_{(3)}, i_{(4)} \}$ and $i_{(1)} \le i_{(2)} \le i_{(3)} \le i_{(4)}$:

Lemma 4.1 (Yoshihara [36]). Let $h_2$ be a centered, degenerate kernel with uniform $(2 + \delta)$-moments
for a $\delta > 0$. If $(X_n)_{n \in \mathbb{N}}$ is absolutely regular, then there is a constant $C$ such
that

\[ |E [h_2(X_{i_1}, X_{i_2}) h_2(X_{i_3}, X_{i_4})]| \le C \beta^{\frac{\delta}{2+\delta}}(m). \]

Lemma 4.2. Let $h_2$ be a centered, degenerate kernel that satisfies the P-Lipschitz-continuity
or the variation condition and has uniform $(2 + \delta)$-moments for a $\delta > 0$, and $(X_n)_{n \in \mathbb{N}}$ a stationary
sequence of random variables. If there is a $\gamma > 0$ with $E|X_k|^\gamma < \infty$, then there exists a
constant $C$ such that the following inequality holds:

\[ |E [h_2(X_{i_1}, X_{i_2}) h_2(X_{i_3}, X_{i_4})]| \le C \alpha^{\frac{2\gamma\delta}{3\gamma\delta + \delta + 5\gamma + 2}}(m). \]

This lemma is due to Dehling, Wendler [11] for P-Lipschitz-continuous kernels. The proof
under the variation condition is very similar and hence omitted.

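Lemma 4.3. Let $h_2$ be a centered, degenerate kernel that satisfies the P-Lipschitz-continuity
or the variation condition and has uniform $(2 + \delta)$-moments for a $\delta > 0$, and $(X_n)_{n \in \mathbb{N}}$ a
1-approximating functional of an absolutely regular process with constants $a_l$. Define
$\alpha_L = \sqrt{2 \sum_{i=L}^{\infty} a_i}$ and $\beta(k)$ as mixing coefficient of $(Z_n)$. Then:

\[ |E [h_2(X_{i_1}, X_{i_2}) h_2(X_{i_3}, X_{i_4})]| \le C \beta^{\frac{\delta}{2+\delta}}\bigl(\lfloor \tfrac{m}{3} \rfloor\bigr) + C \alpha_{\lfloor \frac{m}{3} \rfloor}^{\frac{\delta}{2+\delta}}. \]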

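Proof. First, let $h_2$ be P-Lipschitz-continuous. For simplicity, we consider only the case
$0 = i_1 < i_2 < i_3 < i_4$ and $m = i_2 - i_1 \ge i_4 - i_3$. With Corollary 2.17 of Borovkova et al.
[4], there exist sequences $(X'_n)_{n \in \mathbb{Z}}$ and $(X''_n)_{n \in \mathbb{Z}}$ with the same distribution as $(X_n)_{n \in \mathbb{Z}}$, such
that

(1) $(X''_n)_{n \in \mathbb{Z}}$ is independent of $(X_n)_{n \in \mathbb{Z}}$,

(2) $P \bigl[ \sum_{i=m}^{\infty} |X_i - X'_i| > \alpha_{\lfloor \frac{m}{3} \rfloor} \bigr] \le \alpha_{\lfloor \frac{m}{3} \rfloor} + \beta\bigl(\lfloor \tfrac{m}{3} \rfloor\bigr)$,

(3) $P \bigl[ \sum_{i=0}^{\infty} |X'_{-i} - X''_{-i}| > \alpha_{\lfloor \frac{m}{3} \rfloor} \bigr] \le \alpha_{\lfloor \frac{m}{3} \rfloor}$.

As $h_2$ is degenerate and $X''_{i_1}$ and $(X_{i_2}, X_{i_3}, X_{i_4})$ are independent, we have that

\[ E [h_2(X''_{i_1}, X_{i_2}) h_2(X_{i_3}, X_{i_4})] = 0, \]

so we can now write

\[ |E [h_2(X_{i_1}, X_{i_2}) h_2(X_{i_3}, X_{i_4})]| \]
\[ = |E [h_2(X'_{i_1}, X'_{i_2}) h_2(X'_{i_3}, X'_{i_4}) - h_2(X''_{i_1}, X_{i_2}) h_2(X_{i_3}, X_{i_4})]| \]
\[ \le |E [h_2(X'_{i_1}, X'_{i_2}) h_2(X'_{i_3}, X'_{i_4}) - h_2(X''_{i_1}, X'_{i_2}) h_2(X'_{i_3}, X'_{i_4})]| \]
\[ + |E [h_2(X''_{i_1}, X'_{i_2}) h_2(X'_{i_3}, X'_{i_4}) - h_2(X''_{i_1}, X_{i_2}) h_2(X'_{i_3}, X'_{i_4})]| \]
\[ + |E [h_2(X''_{i_1}, X_{i_2}) h_2(X'_{i_3}, X'_{i_4}) - h_2(X''_{i_1}, X_{i_2}) h_2(X_{i_3}, X'_{i_4})]| \]
\[ + |E [h_2(X''_{i_1}, X_{i_2}) h_2(X_{i_3}, X'_{i_4}) - h_2(X''_{i_1}, X_{i_2}) h_2(X_{i_3}, X_{i_4})]|. \]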
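In order to keep this proof short, we treat only the first of the four summands. Define

\[ h_{2,K}(x, y) = \begin{cases} h_2(x, y) & \text{if } |h_2(x, y)| \le \sqrt{K}, \\ \sqrt{K} & \text{if } h_2(x, y) > \sqrt{K}, \\ -\sqrt{K} & \text{if } h_2(x, y) < -\sqrt{K}. \end{cases} \]

It is clear that $h_{2,K}$ is P-Lipschitz-continuous, too. We get that

\[ |E [h_2(X'_{i_1}, X'_{i_2}) h_2(X'_{i_3}, X'_{i_4}) - h_2(X''_{i_1}, X'_{i_2}) h_2(X'_{i_3}, X'_{i_4})]| \]
\[ = |E [(h_2(X'_{i_1}, X'_{i_2}) - h_2(X''_{i_1}, X'_{i_2})) h_2(X'_{i_3}, X'_{i_4})]| \]
\[ \le E \bigl[ |(h_{2,K}(X'_{i_1}, X'_{i_2}) - h_{2,K}(X''_{i_1}, X'_{i_2})) h_{2,K}(X'_{i_3}, X'_{i_4})| \, 1_{\{|X'_{i_1} - X''_{i_1}| \le \alpha_{\lfloor m/3 \rfloor}\}} \bigr] \]
\[ + E \bigl[ |(h_{2,K}(X'_{i_1}, X'_{i_2}) - h_{2,K}(X''_{i_1}, X'_{i_2})) h_{2,K}(X'_{i_3}, X'_{i_4})| \, 1_{\{|X'_{i_1} - X''_{i_1}| > \alpha_{\lfloor m/3 \rfloor}\}} \bigr] \]
\[ + E \bigl[ |h_{2,K}(X'_{i_1}, X'_{i_2}) h_{2,K}(X'_{i_3}, X'_{i_4}) - h_2(X'_{i_1}, X'_{i_2}) h_2(X'_{i_3}, X'_{i_4})| \bigr] \]
\[ + E \bigl[ |h_{2,K}(X''_{i_1}, X'_{i_2}) h_{2,K}(X'_{i_3}, X'_{i_4}) - h_2(X''_{i_1}, X'_{i_2}) h_2(X'_{i_3}, X'_{i_4})| \bigr]. \]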



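Because of the P-Lipschitz-continuity and $|h_{2,K}(X'_{i_3}, X'_{i_4})| \le \sqrt{K}$, the first summand is
smaller than $2 L \epsilon \sqrt{K}$ (with $\epsilon = \alpha_{\lfloor m/3 \rfloor}$). By property (3) of $(X'_n)_{n \in \mathbb{Z}}$ and $(X''_n)_{n \in \mathbb{Z}}$, the second term is bounded
by

\[ P \bigl[ |X''_{i_1} - X'_{i_1}| > \alpha_{\lfloor m/3 \rfloor} \bigr] \, 2K \le 2 \alpha_{\lfloor m/3 \rfloor} K. \]

As $h_2(X'_{i_1}, X'_{i_2}) h_2(X'_{i_3}, X'_{i_4})$ and $h_2(X''_{i_1}, X'_{i_2}) h_2(X'_{i_3}, X'_{i_4})$ are random variables with $(1 + \frac{\delta}{2})$-moments
smaller than $M$ from the definition of the uniform $(2 + \delta)$-moments, the third
and the fourth summand are bounded by $M / K^{\delta/2}$. In total, we get

\[ |E [h_2(X'_{i_1}, X'_{i_2}) h_2(X'_{i_3}, X'_{i_4}) - h_2(X''_{i_1}, X'_{i_2}) h_2(X'_{i_3}, X'_{i_4})]| \le 2 L \epsilon \sqrt{K} + 2 \alpha_{\lfloor m/3 \rfloor} K + \frac{2M}{K^{\delta/2}}. \]

Setting $K = \bigl( \alpha_{\lfloor m/3 \rfloor} + \beta(\lfloor \tfrac{m}{3} \rfloor) \bigr)^{-\frac{2}{2+\delta}} M^{\frac{2}{2+\delta}}$, keeping in mind that this $K$ is nondecreasing and
treating the other three summands in the same way, one easily obtains

\[ |E [h_2(X_{i_1}, X_{i_2}) h_2(X_{i_3}, X_{i_4})]| \le C M^{\frac{2}{2+\delta}} \Bigl( \beta^{\frac{\delta}{2+\delta}}\bigl(\lfloor \tfrac{m}{3} \rfloor\bigr) + \alpha_{\lfloor \frac{m}{3} \rfloor}^{\frac{\delta}{2+\delta}} \Bigr) \]

for a constant $C$, which proves the lemma for a P-Lipschitz-continuous kernel. Let now $h_2$
satisfy the variation condition. Obviously, the same holds for $h_{2,K}$ and

\[ |E [h_2(X_{i_1}, X_{i_2}) h_2(X_{i_3}, X_{i_4})]| \]
\[ \le |E [h_2(X'_{i_1}, X'_{i_2}) h_2(X'_{i_3}, X'_{i_4}) - h_2(X''_{i_1}, X_{i_2}) h_2(X'_{i_3}, X'_{i_4})]| \]
\[ + |E [h_2(X''_{i_1}, X_{i_2}) h_2(X'_{i_3}, X'_{i_4}) - h_2(X''_{i_1}, X_{i_2}) h_2(X_{i_3}, X_{i_4})]|. \]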
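Again, we concentrate on the first summand. By the variation condition, we have that

\[ E \bigl[ |(h_{2,K}(X'_{i_1}, X'_{i_2}) - h_{2,K}(X''_{i_1}, X_{i_2})) h_{2,K}(X'_{i_3}, X'_{i_4})| \, 1_{\{|X'_{i_1} - X''_{i_1}| \le \alpha_{\lfloor m/3 \rfloor},\ |X'_{i_2} - X_{i_2}| \le \alpha_{\lfloor m/3 \rfloor}\}} \bigr] \]

\[ \le \sqrt{K} \, E \Bigl[ \sup_{\|(x,y) - (X''_{i_1}, X_{i_2})\| \le \sqrt2 \alpha_{\lfloor m/3 \rfloor},\ \|(x',y') - (X''_{i_1}, X_{i_2})\| \le \sqrt2 \alpha_{\lfloor m/3 \rfloor}} |h_{2,K}(x, y) - h_{2,K}(x', y')| \Bigr] \]

\[ \le 2 \sqrt2 \sqrt{K} L \epsilon. \]

As $P \bigl[ |X''_{i_1} - X'_{i_1}| > \alpha_{\lfloor m/3 \rfloor} \bigr] \le \alpha_{\lfloor m/3 \rfloor}$ and $P \bigl[ |X'_{i_2} - X_{i_2}| > \alpha_{\lfloor m/3 \rfloor} \bigr] \le \alpha_{\lfloor m/3 \rfloor} + \beta(\lfloor \tfrac{m}{3} \rfloor)$, it follows
that

\[ E \bigl[ |(h_{2,K}(X'_{i_1}, X'_{i_2}) - h_{2,K}(X''_{i_1}, X_{i_2})) h_{2,K}(X'_{i_3}, X'_{i_4})| \, 1_{\{|X'_{i_1} - X''_{i_1}| > \alpha_{\lfloor m/3 \rfloor} \text{ or } |X'_{i_2} - X_{i_2}| > \alpha_{\lfloor m/3 \rfloor}\}} \bigr] \]

\[ \le P \bigl[ |X''_{i_1} - X'_{i_1}| > \alpha_{\lfloor m/3 \rfloor} \text{ or } |X_{i_2} - X'_{i_2}| > \alpha_{\lfloor m/3 \rfloor} \bigr] \, 2K \le 4 \bigl( \alpha_{\lfloor m/3 \rfloor} + \beta(\lfloor \tfrac{m}{3} \rfloor) \bigr) K. \]

The rest of the proof is the same as above.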



□ 
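The choice of the truncation level $K$ in the proof of Lemma 4.3 balances the term that is linear in $K$ against the tail term $M / K^{\delta/2}$. The short check below illustrates this with hypothetical numbers; `a` stands for $\alpha_{\lfloor m/3 \rfloor} + \beta(\lfloor m/3 \rfloor)$, and the function name is an assumption for this sketch. With $K = a^{-2/(2+\delta)} M^{2/(2+\delta)}$, both $aK$ and $M / K^{\delta/2}$ equal $M^{2/(2+\delta)} a^{\delta/(2+\delta)}$, which is the order of the bound stated in the lemma.

```python
import math


def balancing_level(a, M, delta):
    """Truncation level K = a^(-2/(2+delta)) * M^(2/(2+delta))."""
    return a ** (-2.0 / (2.0 + delta)) * M ** (2.0 / (2.0 + delta))


a, M, delta = 1e-3, 5.0, 1.0             # hypothetical values for illustration
K = balancing_level(a, M, delta)

probability_term = a * K                  # order of the terms of the form alpha * K
tail_term = M / K ** (delta / 2.0)        # order of the truncation error
target = M ** (2.0 / (2.0 + delta)) * a ** (delta / (2.0 + delta))
print(probability_term, tail_term, target)
```

All three printed values should agree up to rounding, for any positive `a`, `M` and `delta`.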



Yoshihara [36] deduced the following moment bound under condition (1) with the help of
Lemma 4.1. Under conditions (2) and (3), the result follows in the same way using Lemmas
4.2 and 4.3 instead.

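Lemma 4.4. Let $(X_n)_{n \in \mathbb{N}}$ be a stationary process and $h_2$ a degenerate, centered kernel with
uniform $(2 + \delta)$-moments for a $\delta > 0$. Let $\tau \ge 0$ be such that one of the following three
conditions holds:

(1) $(X_n)_{n \in \mathbb{N}}$ is absolutely regular and $\sum_{k=0}^{n} k \beta^{\frac{\delta}{2+\delta}}(k) = O(n^\tau)$.

(2) $(X_n)_{n \in \mathbb{N}}$ is strongly mixing, $E|X_1|^\gamma < \infty$ for a $\gamma > 0$, $h_2$ satisfies the P-Lipschitz-continuity
or the variation condition and $\sum_{k=0}^{n} k \alpha^{\frac{2\gamma\delta}{3\gamma\delta + \delta + 5\gamma + 2}}(k) = O(n^\tau)$.

(3) $(X_n)_{n \in \mathbb{N}}$ is a 1-approximating functional of an absolutely regular process and $h_2$ satisfies
the P-Lipschitz-continuity or the variation condition. For $\alpha_L = \sqrt{2 \sum_{i=L}^{\infty} a_i}$: $\sum_{k=0}^{n} k \bigl( \beta^{\frac{\delta}{2+\delta}}(k) + \alpha_k^{\frac{\delta}{2+\delta}} \bigr) = O(n^\tau)$.

Then

\[ \sum_{i_1, i_2, i_3, i_4 = 1}^{n} |E [h_2(X_{i_1}, X_{i_2}) h_2(X_{i_3}, X_{i_4})]| = O(n^{2+\tau}). \]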

Lemma 4.5. If $h$ satisfies the P-Lipschitz-continuity or the variation condition, then the
condition holds also for $h_2$.

Proof. For P-Lipschitz-continuous kernels, we refer to Dehling, Wendler [11], proof of Lemma
3.3. Let now $h$ satisfy the variation condition. As $h_2(x, y) = h(x, y) - h_1(x) - h_1(y) - \theta$,
it suffices to verify this condition for $h_1$. Recall that $h_1(x) = E[h(x, Y)] - \theta$, so

\[ E \Bigl[ \sup_{\|(x,y) - (X,Y)\| \le \epsilon,\ \|(x',y') - (X,Y)\| \le \epsilon} |h_1(x) - h_1(x')| \Bigr] \]

\[ = E \Bigl[ \sup_{|x - X| \le \epsilon,\ |x' - X| \le \epsilon} |E[h(x, Y)] - E[h(x', Y)]| \Bigr] \]

\[ \le E \Bigl[ \sup_{|x - X| \le \epsilon,\ |x' - X| \le \epsilon} E |h(x, Y) - h(x', Y)| \Bigr] \]

\[ \le E \Bigl[ \sup_{|x - X| \le \epsilon,\ |x' - X| \le \epsilon} |h(x, Y) - h(x', Y)| \Bigr] \]

\[ \le E \Bigl[ \sup_{\|(x,y) - (X,Y)\| \le \epsilon,\ \|(x',y') - (X,Y)\| \le \epsilon} |h(x, y) - h(x', y')| \Bigr] \le L \epsilon. \]

□



5. Proofs of the theorems. 
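Proof of Theorem 1. We define

\[ Q_n(h_2) := \sum_{1 \le i_1 < i_2 \le n} h_2(X_{i_1}, X_{i_2}), \qquad a_n := \frac{1}{n^{1 + \frac{\tau}{2}} \log^{3/2} n \, \log\log n}. \]

With the method of subsequences, it suffices to show that

(4) \[ a_{2^l} Q_{2^l}(h_2) \to 0 \quad \text{a.s.,} \]

(5) \[ \max_{2^{l-1} \le n < 2^l} |a_n Q_n - a_{2^{l-1}} Q_{2^{l-1}}| \to 0 \quad \text{a.s.} \]

as $l \to \infty$. We use the Chebyshev inequality and Lemma 4.4 to prove the first line. For
every $\epsilon > 0$:

\[ \sum_{l=2}^{\infty} P \bigl[ |a_{2^l} Q_{2^l}(h_2)| > \epsilon \bigr] \le \frac{1}{\epsilon^2} \sum_{l=2}^{\infty} a_{2^l}^2 E \bigl[ Q_{2^l}^2(h_2) \bigr] \le \frac{C}{\epsilon^2} \sum_{l=2}^{\infty} \frac{1}{l^3 \log^2 l} < \infty. \]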

(4) follows with the Borel-Cantelli Lemma. To prove (5), we first have to find a bound for
the second moments, using a well-known chaining technique. For example, by the triangle
inequality we have

\[ |a_{15} Q_{15} - a_8 Q_8| \le |a_{15} Q_{15} - a_{14} Q_{14}| + |a_{14} Q_{14} - a_{12} Q_{12}| + |a_{12} Q_{12} - a_8 Q_8|. \]
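The chaining decomposition used here follows the binary expansion of $n - 2^{l-1}$: starting from $2^{l-1}$, one adds the powers of two contained in the remainder from the largest to the smallest. The following sketch (with the hypothetical helper name `dyadic_chain`) lists the intermediate indices for a given $n$ and reproduces the example with $n = 15$.

```python
def dyadic_chain(n):
    """Indices visited when moving from the largest power of two <= n up to n in dyadic steps."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    base = 1 << (n.bit_length() - 1)      # 2^(l-1), the largest power of two <= n
    chain = [base]
    current = base
    remainder = n - base
    step = base >> 1
    while step >= 1:
        if remainder & step:              # this power of two is part of n - 2^(l-1)
            current += step
            chain.append(current)
        step >>= 1
    return chain


print(dyadic_chain(15))  # [8, 12, 14, 15]
```

Each consecutive pair in the chain has the form $2^{l-1} + (i-1) 2^{d-1}$, $2^{l-1} + i 2^{d-1}$ for some $d$ and $i$, and every $d$ occurs at most once, which is why the maximum over $n$ can be bounded by a sum over $d$ of maxima over $i$.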
Using such a decomposition for all $n$ with $2^{l-1} \le n < 2^l$, we conclude that

\[ \max_{2^{l-1} \le n < 2^l} |a_n Q_n - a_{2^{l-1}} Q_{2^{l-1}}| \le \sum_{d=1}^{l} \max_{i = 1, \dots, 2^{l-d}} \bigl| a_{2^{l-1} + i 2^{d-1}} Q_{2^{l-1} + i 2^{d-1}} - a_{2^{l-1} + (i-1) 2^{d-1}} Q_{2^{l-1} + (i-1) 2^{d-1}} \bigr|. \]

As for any random variables $Y_1, \dots, Y_n$: $E (\max_i |Y_i|)^2 \le \sum_i E Y_i^2$, it follows that

\[ E \Bigl[ \max_{2^{l-1} \le n < 2^l} |a_n Q_n - a_{2^{l-1}} Q_{2^{l-1}}|^2 \Bigr] \]

\[ \le l \sum_{d=1}^{l} \sum_{i=1}^{2^{l-d}} E \Bigl[ \bigl( a_{2^{l-1} + i 2^{d-1}} Q_{2^{l-1} + i 2^{d-1}} - a_{2^{l-1} + (i-1) 2^{d-1}} Q_{2^{l-1} + (i-1) 2^{d-1}} \bigr)^2 \Bigr] \]

\[ \le l \sum_{d=1}^{l} \sum_{i=1}^{2^{l-d}} E \Bigl[ \bigl( a_{2^{l-1} + i 2^{d-1}} ( Q_{2^{l-1} + i 2^{d-1}} - Q_{2^{l-1} + (i-1) 2^{d-1}} ) + ( a_{2^{l-1} + i 2^{d-1}} - a_{2^{l-1} + (i-1) 2^{d-1}} ) Q_{2^{l-1} + (i-1) 2^{d-1}} \bigr)^2 \Bigr] \]

\[ \le l \sum_{d=1}^{l} \sum_{i=1}^{2^{l-d}} 2 a_{2^{l-1} + i 2^{d-1}}^2 E \bigl( Q_{2^{l-1} + i 2^{d-1}} - Q_{2^{l-1} + (i-1) 2^{d-1}} \bigr)^2 + l \sum_{d=1}^{l} \sum_{i=1}^{2^{l-d}} 2 \bigl( a_{2^{l-1} + i 2^{d-1}} - a_{2^{l-1} + (i-1) 2^{d-1}} \bigr)^2 E Q_{2^{l-1} + (i-1) 2^{d-1}}^2 \]

\[ \le l \sum_{d=1}^{l} 2 a_{2^{l-1} + 2^{d-1}}^2 E \Bigl[ \sum_{i=1}^{2^{l-d}} \bigl( Q_{2^{l-1} + i 2^{d-1}} - Q_{2^{l-1} + (i-1) 2^{d-1}} \bigr)^2 \Bigr] \]

\[ + l \sum_{d=1}^{l} \sum_{i=1}^{2^{l-d}} 2 \bigl( a_{2^{l-1} + i 2^{d-1}} + a_{2^{l-1} + (i-1) 2^{d-1}} \bigr) \bigl( a_{2^{l-1} + (i-1) 2^{d-1}} - a_{2^{l-1} + i 2^{d-1}} \bigr) E Q_{2^{l-1} + (i-1) 2^{d-1}}^2 \]

\[ \le C \frac{1}{l \log^2 l}. \]

In the last line we used the fact that the sequence $(a_n)_{n \in \mathbb{N}}$ is decreasing and Lemma 4.4.
It now follows for all $\epsilon > 0$ with the Chebyshev inequality

\[ \sum_{l=2}^{\infty} P \Bigl[ \max_{2^{l-1} \le n < 2^l} |a_n Q_n - a_{2^{l-1}} Q_{2^{l-1}}| > \epsilon \Bigr] \le \frac{1}{\epsilon^2} \sum_{l=2}^{\infty} E \Bigl[ \max_{2^{l-1} \le n < 2^l} |a_n Q_n - a_{2^{l-1}} Q_{2^{l-1}}|^2 \Bigr] \le \frac{C}{\epsilon^2} \sum_{l=2}^{\infty} \frac{1}{l \log^2 l} < \infty, \]

the Borel-Cantelli Lemma completes the proof.
□



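Proof of Theorem 2. We give the proof of the theorem only under condition (1) and omit the
similar proofs under conditions (2) and (3) (where Lemma 4.5 is used to conclude that $h_1$ is
P-Lipschitz-continuous and Rio's result [33] has to be replaced by the result of Philipp and
Stout [31, chapter 7] under condition (3)).

First note that $E|h_1(X_n)|^{2+\delta} \le E|h(X, Y)|^{2+\delta} < \infty$ ($X$, $Y$ being independent). By
standard arguments

(6) \[ \frac{1}{n} \operatorname{Var} \Bigl[ \sum_{i=1}^{n} h_1(X_i) \Bigr] \xrightarrow{n \to \infty} \sigma_\infty^2 = \operatorname{Var}[h_1(X_0)] + 2 \sum_{i=1}^{\infty} \operatorname{Cov}[h_1(X_0), h_1(X_i)] \]

and by Lemma 4.4 $n \operatorname{Var}[U_n(h_2)] \to 0$. So we have

(7) \[ \frac{\operatorname{Var} \bigl[ \sum_{1 \le i < j \le n} h(X_i, X_j) \bigr]}{\operatorname{Var} \bigl[ (n - 1) \sum_{i=1}^{n} h_1(X_i) \bigr]} \xrightarrow{n \to \infty} 1. \]

As $(\beta(n))_{n \in \mathbb{N}}$ is nonincreasing and $\sum_{k=0}^{n} k \beta^{\frac{\delta}{2+\delta}}(k) = O(n^{1-\epsilon})$, it follows that

\[ \sum_{k=1}^{\infty} k^{\frac{2}{\delta}} \alpha(k) \le \sum_{k=1}^{\infty} k^{\frac{2}{\delta}} \beta(k) < \infty, \]

so by Theorem 2 of Rio [33] the LIL holds for $\sum_{i=1}^{n} h_1(X_i)$. By Theorem 1 and Line 7, this
holds also for $\sum_{1 \le i < j \le n} h(X_i, X_j)$.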



□